Effective lane detection on complex roads with convolutional attention mechanism in autonomous vehicles

Autonomous Vehicles (AV’s) have achieved more popularity in vehicular technology in recent years. For the development of secure and safe driving, these AV’s help to reduce the uncertainties such as crashes, heavy traffic, pedestrian behaviours, random objects, lane detection, different types of roads and their surrounding environments. In AV’s, Lane Detection is one of the most important aspects which helps in lane holding guidance and lane departure warning. From Literature, it is observed that existing deep learning models perform better on well maintained roads and in favourable weather conditions. However, performance in extreme weather conditions and curvy roads need focus. The proposed work focuses on presenting an accurate lane detection approach on poor roads, particularly those with curves, broken lanes, or no lane markings and extreme weather conditions. Lane Detection with Convolutional Attention Mechanism (LD-CAM) model is proposed to achieve this outcome. The proposed method comprises an encoder, an enhanced convolution block attention module (E-CBAM), and a decoder. The encoder unit extracts the input image features, while the E-CBAM focuses on quality of feature maps in input images extracted from the encoder, and the decoder provides output without loss of any information in the original image. The work is carried out using the distinct data from three datasets called Tusimple for different weather condition images, Curve Lanes for different curve lanes images and Cracks and Potholes for damaged road images. The proposed model trained using these datasets showcased an improved performance attaining an Accuracy of 97.90%, Precision of 98.92%, F1-Score of 97.90%, IoU of 98.50% and Dice Co-efficient as 98.80% on both structured and defective roads in extreme weather conditions.

• With E-CBAM model, all the lanes in the road image is detected accurately without any loss of original image resolution on filtering the non-lane objects.• To increase the training performance in deep neural network a skip connection is established in residual block and attentive normalization is used to normalize the complex features in terms of representation learning and improved performance.• The proposed model is applied to extreme weather conditions that includes shadows, curves, rain, broken and no lane markings that results to low false positives and good accuracy performance that can ensure the safety for autonomous driving vehicles.
The paper is organised as follows: In Section "Literature survey", a discussion of the most recent lane recognition methods is presented.An elaborate summary of the research technique is given in Section "Proposed LD-CAM Architecture for Lane Detection".In Section "Encoder unit", "Enhanced Convolutional Block Attention Module (E-CBAM)" and "Decoder unit", scenarios, the research procedure and assessment findings of lane detection are presented and analysed.Last section covers the Experiments and Results, followed by Conclusion.

Literature survey
According to the literature, computer vision-based algorithms have been frequently used to address lane detecting challenges.These machine vision-based approaches are broadly classified as classical and Deep learning based.
Yao et al. 22 suggested an enhanced deep neural network (DNN) for attention, which consists of two branches operating at various resolutions and is a lightweight semantic segmentation architecture designed for effective computation in less memory.The suggested network computes dense feature maps for prediction tasks at low resolution in global context.Multiple attention techniques are implemented based on the properties of differential feature resolution qualities.However, the accuracy of the model is not improved.Tabelini et al. 23 provided an innovative approach to lane recognition using deep polynomial regression to produce polynomials that represent each lane marking in the picture from an image captured by a forward-looking camera installed in a vehicle.However, the accuracy of the model has dropped drastically.The model might not have been able to learn this circumstance from less number of samples available since the pictures in those scenarios have a significantly different structure.Zhang et al. 24 enhanced existing semantic segmentation approaches for lane detection and provided a DC-VGG-SAD network (VGG: visual geometry group) that uses dilated convolution (DC) to lower network complexity and guarantee detection accuracy.Moreover, including self-attention distillation (SAD) accelerates the transfer of information.Zhang et al. 25 suggested a spatiotemporal network with double Convolutional Gated Recurrent Units (ConvGRUs) to handle lane detection in difficult scenarios.The locations and functionality of the two ConvGRUs in the network are distinct, although with identical architecture.Moreover, this work detects more information than the marked length leading's to more false positive results.Lu et al. 26 proposed a method known as Scene Understanding Physics-Enhanced Real-time (SUPER) for realtime lane recognition.The suggested approach is divided into two primary modules: a multi-lane parameter optimisation module for lane inference that is strengthened by physics, and a hierarchical semantic segmentation network acting as the scene feature extractor.Identifying complicated lane scenarios is the concern.Lee et al. 27 provided a lightweight UNet for end-to-end learning of path prediction (PP) and lane identification in autonomous driving that uses depthwise separable convolutions (DSUNet).Additionally, convolutional neural network (CNN) incorporated with PP to create a simulation model (CNN-PP) that can be used to evaluate the qualitative, quantitative, and dynamic performance of CNN in a host agent car that is autonomously driving alongside other agents in real time.This research lacks real time implementation when the host is travelling alongside with other agents.Liu et al. 28 provided a reliable lane recognition model for complicated driving scenarios by combining contextual driving information with vertical spatial characteristics.The suggested model can detect unclear and obstructed lane lines more robustly with the two developed blocks, the feature merging block and the information exchange block that makes better use of contextual information and vertical spatial characteristics.However, it does not provide the computational analysis for real time applications and resource constrained systems.Jinfu Chen et al. 29 proposed the AIdetectorX-SP model, which aims to enhance VD capabilities and efficiently recover vulnerability logic aspects by including a Temporal Convolutional Network (TCN) and Self-attention Mechanism (SaM).Overall aim was to create an automatic feature design detection model for software vulnerabilities with improving detection accuracy and lowering false positive and false negative rates.Yassine Oukdach et al. 30 worked on automatic identification of gastrointestinal disorders using video capsule endoscopy (VCE) pictures by fusion of deep learning model with fusion transformers and CNN with attention mechanism to improve the classification performance in VCE images.Deepak Kumar Dewangan et al. 31 carried experiments on detection of speed bumps by applying vision-based convolutional neural network model and SBDNet for autonomous vehicle system.Also, it draws attention to the experiment's learning rate parameter and its analysis, highlighting how crucial it is to the speed bump detection network's functionality.Deepak Kumar Dewangan et al. 32 worked on detection of road conditions in different scenarios such as clear, fog, snow, rain and night roads.In order to segment road regions, the study presents a unique model named DSLnO + FCN, which is based on the VGG-net architecture and takes location prior data, RGB channel, and semantic form into account.Mallikarjun Anandhalli et al. 33 Tresearched to overcome the shortcomings of current techniques by creating a revolutionary vehicle recognition algorithm that can recognize vehicles in a variety of climatic conditions and complex surroundings.They also focussed to localize vehicles using corner sites and group those places to increase the precision of traffic surveillance systems, boosting the overall traffic management and monitoring performance.Mallikarjun Anandhalli et al. 34 carried out work on five essential steps such as corner detection, modeling, tracking and grouping for vehicle recognition.Their proposed algorithms ability is to correlate closely adjacent corner points within vehicular zones, based on Euclidean distance which helps to improve the accuracy of vehicle recognition and tracking.Mallikarjun Anandhalli et al. 35 research focuses on vehicle detection and measuring the vehicle speed which voilates the traffic rules.Also, mapping the geometric plane of the physical environment onto the image plane allows for the calibration of a vehicle's distance covered in both the image plane and the real world, which helps with accurate speed calculation.
According to the literature, several CNN-based techniques have previously been proposed to enhance the result for recognizing lanes in a variety of situations as shown in Table 1.Because to a paucity of training data on terrible roads, however, relatively few research has concentrated on creating simple CNN-based techniques that can predict lanes properly in bad weather and on poorly constructed roads.Moreover, while applying the attention technique, the edge information has not been correctly obtained.The other restriction when employing the attention mechanism is the need for more wasted computational resources.So, in this research, a unique CNN-based lane recognition model is created using a mixed dataset and evaluated in a variety of weather and road conditions.

Ethics approval
The work doesn't require any approval from ethics committee.

Proposed LD-CAM Architecture for Lane Detection
Lane detection accuracy is crucial for autonomous vehicles.However, traditional methods require hand-crafted features and post-processing procedures.Deep Learning models have been proposed for pixel-level lane segmentation, which focus on high accuracy on structured roads and good weather conditions, neglecting unstructured roads like cracked lane lines and curvy ones, and often have complicated architectures.
An LD-CAM architecture for lane detection in unstructured roads is proposed in this work.The preliminary pre-processing module focuses to reduce the size of the input image thereby reducing the memory and applies linear interpolation to estimate the missing data in the dataset.This helps to speed up the process.Further, the RGB colour image is transformed into gray-scale image.
The pre-processed image is then fed to LD-CAM architecture which consists of an encoder, Enhanced Convolutional Block Attention modules (E-CBAM), and a decoder.The encoder extracts essential features from the pre-processed image and builds low-level (edges, corners, textures) and high-level (lane boundaries, lane markings, lane curvature, lane transitions, and road geometry) feature maps.The features created by the encoder section is transmitted through the attention module block which focuses on the main features such as lane marking, lane transitions and lane curvature to extract these features on road region.The decoder module reconstructs the feature maps collected from the unique attention module to provide the predicted images with the exact resolution as the input images.Figure 1 depicts the general framework of proposed LD-CAM Architecture.In the next sections, the architecture of every component of the model is briefly examined.

Encoder unit
The encoder module consists of four residual blocks and four strided convolution layers.Each residual block of the encoder module consists of three convolutional layers, three normalisation layers, and three activation functions which assures reliable feature extraction and transformation.The four strided convolution layers of the encoder module takes care of down sampling and concentrates on the pre-processed image features with minimal loss.The input feature maps are normalised using attentive normalisation (AN), which is an essential step in stabilising and improving the training procedure.Gaussian Error Linear Unit (GELU) is used as the activation function for convolutional layers, which combines the positive aspects of Rectified Linear Units (ReLUs) with dropout.The issue with ReLU is that, when the input lane image of a neuron in the network has a negative value, it is considered as zero.Hence, to overcome this, the proposed work has opted GELU activation function for smoother flow and to avoid overfitting problems.The construction of the residual blocks is displayed in Fig. 2.
Using a filter size of 1 × 1, the first convolutional layer in the residual block produces feature maps with a dimension of 'n' , which is the number of feature maps generated by this convolutional layer in the residual block.The 1 × 1 convolution acts as a pointwise convolution in this block and enables the model to modify the   where, l is the input to the GELU function, tanh is the hyperbolic tangent function, 2 π , 0.044715 are constant factor.A smooth, differentiable function GELU is useful for gradient-based optimization and training.Because of its zero-centeredness, the vanishing gradient problem is lessened and neural network convergence is aided 36 .In contrast to the hyperbolic or sigmoid tangent functions, GELU does not reach saturation at higher input values.Further, a 3 × 3 filter size namely the second convolutional layer is used to transfer the extracted features from the GELU.The 3 × 3 convolutional layer captures hierarchical characteristics in the input data, unlike the first 1 × 1 convolution, which is mainly concerned with altering channel-wise information and collecting complicated relationships.The second convolutional layer allows the model to capture fine-grained features and local relationships in the data by using the 3 × 3 filter.The Residual block identifies and considers patterns which encompasses many pixels due to its higher filter size, thereby enhancing the recognition of deeper patterns in the input.To achieve a balance between collecting local and global characteristics, it is typical to employ many convolutional layers in a residual block with different filter sizes.Within the residual block, the last convolutional layer uses a 1 × 1 filter with a size of 'n*2.' This method of applying 1 × 1 convolutions helps control the computational complexity while preserving the ability to detect complicated patterns in the data.
The skip connection is proposed to tackle the problem of disappearing gradients.The output from the residual block 1 that comes after the GELU layer is integrated and sent as the input to the second residual block and so on through this skip connection.For the gradients during backpropagation, this creates a straight channel for the skip connection to mitigate the vanishing gradient problem, ensuring that the model can efficiently learn from the input data.The skip connection effectively functions as a shortcut, allowing the gradient flow through the deep learning model to be smoother.Once the output from residual block has been passed through the strided convolution layer, it is down-sampled by a factor of two times the size of the input feature map.At the final encoder block, the downsampled and altered representation of the input data is reflected in a feature map with dimensions of H/16 × W/16 × C. Finally, it extracts the low level and high level features and fed into the E-CBAM module.

Enhanced convolutional block attention module (E-CBAM)
A critical processing step is carried out by the refined activation maps through Enhanced-Convolutional Block Attention Modules (E-CBAMs), which preserve the same size as the intermediate activation maps acquired from the encoder stages.These intermediate activation maps come from four phases, each of which represents a particular stage of the model's feature extraction process.To improve the characteristics of the input feature maps, E-CBAMs must be processed together with these maps.The intermediate activation maps' spatial and channel-wise information has been preserved in the refined activation maps that result from the relationship with E-CBAMs.The overall quality of the feature representations is improved by the model's ability to selectively pay attention to and emphasize important characteristics through this approach.The following Fig. 3 shows E-CBAM Architecture.

Enhanced channel attention module
The primary goal of the enhanced channel attention module (ECAM) is to dynamically strengthen features that transmit important information while suppressing those that are less significant.The channel attention module uses global data to instruct the network's architecture from the intermediate map and helps to recognize the significance of different channels in the feature maps from the intermediate features.The conventional CBAM block cannot identify edge information or the road border when lane markings have completely disappeared from the road or when a lane's boundary is not visible.Sometimes there is a waste of computing resources and parameter build-up with the typical channel attention module.In this research, the MLP layer is eliminated from the channel attention module, and replaced with a 1D convolution layer to learn and localize significant channel-wise features immediately following average pooling.Subsequently, the outputs undergo processing using a sigmoid() and multiplicity of the intermediate feature to produce the channel-improved output features as shown in Eq. ( 2) where, M ECA (I) is the C-th channel attention map, σ represents the sigmoid function, which maps its input to a value between 0 and 1. (1D k ) stands for a 1D convolutional procedure with a k-sized kernel.The average-pooled feature map is subjected to this convolutional procedure, which captures patterns and relationships among channels and I c avg indicates the average intermediate feature map from the channel part (C).Input for the Enchanced Spatial Attention Module (ESAM), a channel-refined feature is discussed below.

Enhanced spatial attention module
The output of channel-refined features from the ECAM is sent to ESAM.The spatial dimension information of the input feature maps is focused and compressed using maxpool and avgpool in "Enhanced Spatial Attention Module".It then goes through a sharpening filter to enhance the information on edge features.Sobel Sharpening filters are used in the "Enhanced Spatial Attention" approach to increase spatial attention in the context of road lane detection.The ECAM of the CBAM is comparable to the ESAM module, which focuses on determining the optimum position and intensity of the road lane borders.The objective is to enhance the detection accuracy of road lane margin.Once the mapping feature channel is narrowed, the pooled output is subjected to a sharpening filter with weighted assignments.Further, the convolution layer reduces the feature map's channel and passed through a sigmoid activation function.The product of the sigmoid function with channel refined feature is obtained to get the spatial refined feature maps.The process allows the system to make improvements and adjustments to the highway lane margin data.This is Mathematically presented in Eq. (3).
where, σ signifies the sigmoid activation function, f n X n indicates sharpening filter, which is used to improve the concatenated feature maps' spatial details with a filter dimension of n x n, and P 1X 1 is a 1*1 pointwise convolu- tion applied to the sharpened feature maps.I S avg and I S max is the average intermediate feature map and maximum intermediate feature map from the spatial part.I S avg * I S max denotes the concatenation of the average-pooled and max-pooled feature maps along the channel dimension.The channel attention module's average pooling operations make the process scale-invariant by capturing contextual information regardless of the amount of the input.Working with fixed-size pooling feature vectors, the 1D convolutional layer is flexible enough to function at any resolution.Because of its weight-sharing and local receptive fields, convolutional filters applied to spatial www.nature.com/scientificreports/maps in the spatial attention module can accommodate variable spatial dimensions.Re-scaling element-wise multiplication enables CBAM to successfully improve feature representations at various scales and resolutions by matching attention maps with input feature maps regardless of resolution.Finally, the E-CBAM refined features and the last set of Stride convolution layers of features from the encoder module are then transferred to the decoder module will be discussed in the upcoming section.In this work, the 3 × 3 first order sobel sharpening to the utilization of the extracted features' edge data is reinforced.

Decoder unit
The decoder unit in this architecture is a crucial component comprising concatenation, activation functions, deconvolutional blocks which are similar to encoder, and up-sampling layers.Each de-convolutional block consists of three consecutive conv2D transpose layers, all containing the same number of filters (1 × 1, 3 × 3 and 1 × 1).Following these convolutional layers, Attention normalization is applied, and three GELU activation functions are sequentially utilized to introduce non-linearity to the processed data.The process begins with the gathering of outputs from each E-CBAM block, including E-CBAM-1, E-CBAM-2, E-CBAM-3, and E-CBAM-4, which are combined with a 2 × 2 up-sampling layer in the decoder.The concatenation procedure specifically pairs outputs from E-CBAM-1 and E-CBAM-2, as well as E-CBAM-3 and E-CBAM-4, ensuring that no features are lost during this integration step.To construct two activation maps, the concatenated outputs are then carefully combined, emphasizing the effective integration of both low-level and high-level features.Notably, E-CBAM-2 and E-CBAM-4 undergo a 2 × 2 up-sampling process before concatenation to match the form of E-CBAM-1 and E-CBAM-3.Following this, three parallel de-convolutional blocks are employed, taking inputs of 2 × 2 upsampling, E-CBAM-1, and E-CBAM-2.The necessity to recover the original features from each encoder block is the driving force behind the use of 2 × 2 up-sampling in this stage.The de-convolutional blocks further receive inputs up-sampled to 2 × 2, 4 × 4, and 16 × 16 to maintain consistency with the proportions of the original input images.This consistency ensures that the reconstructed features align with the initial input structure.Creating a single map that combines the minimal, excellent, and refined feature levels is the final phase.Concatenating all three feature maps yields this result.Next, a 1 × 1 convolution with a single filter is used to create a predicted image.The whole architecture is made to collect and aggregate data at several scales in order to provide a deeper and more accurate prediction.

Experiments and results
Experiment to assess the results and robustness of the proposed framework was conducted and the evaluation metrics of the proposed model was analysed.Estimating lanes from road images in various scenarios such as damaged road, different weather conditions and curvy roads were recorded.

Implementation details
In this work, three datasets are utilized, such as TuSimple (different weather conditions images), curve lanes (different curve lanes images) and cracks and potholes in road images (damaged road images), to train the proposed model.The suggested method has been trained to retain essential spatial information for accurate prediction.Around 6,408 road images were obtained from TuSimple dataset on US highways 37 .The image has a resolution of 1280 × 720.From the available dataset, 3,626 images were used for for training, 358 images for validation, and 2,782 images for testing under different weather conditions.CurveLanes 38 is a brand-new benchmark lane detection dataset with 150 K lanes images for challenging traffic lane detection scenarios including curves and multi-lanes.It is gathered in actual urban and highway settings throughout many Chinese cities.It sets a more difficult bar for the community and is the largest lane detection dataset till date.We divided the entire 150 K dataset into three parts: 100 K for training, 20 K for valuation, and 30 K for testing.Most of the images in this dataset have a resolution of 2650 × 1440.Each image's lanes are manually annotated with natural cubic splines and the selected images contain curvy lane.There are more challenging situations including S-curves, Y-lanes, night and multi-lanes in above dataset.The cracks and potholes in road images dataset 39 contain more than 5,000 images of potholes on actual roads taken in the wild.These photographs have a high resolution of 1920 × 1080 or above and were taken utilising their crowd sourcing platform from more than 2000 different places.From the available dataset 56 percentage is used for training and 44 percentage is used for testing on applying train-testsplit technique from the scikit-learn package as shown in Fig. 4. The deep learning framework called Tensorflow version 2.12.0 and keras verison 2.12.0 is used in this work and deployed in Jupiter Notebook with Python 3.9 platform.Proposed model was trained using the Stochastic Gradient Descent (SGD) optimizer, a batch size of 10, a learning rate of 5e-5, and 50 epochs.The model was trained and experiments were carried out with Windows 11 pro, 32 GB of RAM and an Intel Core i9-12,900@ 3.20 GHz HP Z2 G9 SFF GPU processor.
The SGD Optimizer was chosen in this work to improve the performance of the adaptive learning rates and handle large datasets efficiently 40 .This optimizer speeds up the training process by updating model parameters based on a random selection of the training process which will reduce the processing time for the overall attention based encoder-decoder architecture 41 .The steps involved for SGD optimizer are shown below; • Sort the training dataset in a random order.
• Split the dataset into smaller groups.
• Determine the gradient of the loss function with respect to • the model parameters for every mini-batch.
• Scaled by a learning rate, update the model parameters in the gradient's opposite direction.

Performance results
Validating the efficiency of the mixed dataset and contrasting it with other standard methods is one way to assess its resilience and accuracy.As the goal of the proposed model is to distinguish between "Lane" and "Non-Lane" pixels, measures such as pixel accuracy, Dice-coefficient, Intersection over Union (IoU), precision, recall and F1 score was considered to evaluate it 42 .In the case of a binary segmentation task, pixel accuracy is defined as the percentage of properly categorised pixels.Pixel accuracy cannot be the greatest metric for evaluating the segmentation task, nevertheless, owing to the problem with class imbalance.Most of the Non-lane pixels in the dataset for lane recognition are highly out of balance in the images.As both of these are based on the area of overlapping among the real picture and the anticipated image, the Dice coefficient and the IoU are considered to be more effective metrics for segmenting the lane path to attain the best lane detection.Both metrics have significance for precise borders, shapes, and in assessing small objects in segmentation task because they are efficient in quantifying overlap, managing class imbalances, and being sensitive to partial matches.The Dice coefficient and IoU 43 is derived from the Eq. ( 4) and ( 5) respectively.
On dividing the two overlapping regions by the total number of pixels, it demonstrates the Dice coefficient and IoU simultane-ously shows the union region that separates the anticipated visuals from the ground truth.
(4) DiceCoefficient = 2 x predicted x groundtruth x predicted + x groundtruth (5) IoU = x predicted x groundtruth x predicted + x groundtruth − x predicted x groundtruth However, the proposed model managed to account for this variance, and beginning with the subsequent epochs, the curves stabilised another time.In Fig. 5b,c, the accuracy starts to rise after the 15th epoch and remains stable throughout until final epochs.It is concluded from the loss graph that the train and test loss is maintained stable even after the increase in epochs.
Based on the parameters chosen for the test set, Figure 6a-d and Table 2 presents the results obtained with existing methods and the proposed method.Significant findings are shown by comparing the performance of several models, such as U-net, Encoder-decoder SegNet VGG16, SCNN Unet Light ConvGRU, SCNN Unet Light ConvLSTM, SHPDAN (Spatial Hierarchical Dilated Attention Network) and the proposed LD-CAM architecture.U-net achieves an F1 score of 87.7% while having a comparatively low accuracy of 79% and a high recall of 98.5%.This indicates that although U-net has a greater proportion of false positives, it is also strong at finding actual positives.The SegNet VGG16 encoder-decoder has a well-balanced performance, and excellent overall efficacy in maintaining low false positive and false negative rates, with accuracy of 96.61%, precision of 98.91%, recall of 93.89%, and F1 score of 96.3%.The F1 scores of 90.2% and 88%, respectively, for the SCNN Unet Light ConvGRU and SCNN Unet Light ConvLSTM models indicate that they perform better in terms of recall 96% and 95%, respectively, but less in terms of accuracy as 85% and 83%, when compared to the encoder-decoder SegNet VGG16 and SHPDAN has accuracy as 96.04%.Although these models show an excellent balance, they do not achieve peak performance.Nevertheless, with an accuracy of 97.90%, precision of 98.92%, recall of 98.8%, and an F1 score of 97.70%, the proposed LD-CAM architecture performs better than the other existing models.This results demonstrate LD-CAM's sophisticated ability to precisely detect essential characteristics while reducing false positives and negatives.The model is the most efficient among those tested because of its high accuracy and recall, while it demonstrates how well it maintains a great balance.The results clearly present that the proposed approach performs superior compared to the methods on all measures.This suggests that proposed model will be more adaptable when used to create a real-world self-driving vehicle that recognises lanes in real-time to avoid accidents.Figure 7 depicts the overall performances of the proposed method and the graph reveals that accuracy attained 97.90%, IoU of 98.50%, Precision of 98.92%, F1 Score of 97.70%, Recall of 98.80% and Dice coefficient as 98.80%.Some visualisation results are provided that was obtained using the proposed approach to test the resilience of proposed model.The curvy road conditions are preferred for this experiment because properly detecting damage on a curved road is one amongst the most difficult tasks for existing algorithms.
Figure 8a,b depicts the original images and the associated anticipated images, with lanes highlighted in red at the curvy of the road stated below.
Poor pavement conditions are observed in many roads in developing countries.Also, roads are covered with huge holes and various cracks.Furthermore, the lane delineation is hazy and disappears from time to time.In this investigation, the aforementioned situation was analysed and the proposed model was challenged to anticipate lanes in such bad conditions.Figure 9a,b depicts the actual and projected frames by proposed model in the presence of faulty pavement conditions.The given images were employed to choose a route that had several openings and cracks in the roadways.The road has lost even its lane markers.However, the proposed  www.nature.com/scientificreports/algorithm accurately anticipated the image's lane margins, which is excellent enough for a car to operate safely even in adverse conditions.Stochastic Gradient Descent (SGD) optimizer, and Gaussian Error Linear Unit (GeLU) activation function was applied.The lanes on these roadways are adequately delineated, and the roadway is in good shape.For this experiment, ideal weather and lighting conditions are chosen and an experiment was carried out is shown in Fig. 10a,b.Figures 8a and 10b shows that the proposed model estimated the lanes effectively in all circumstances.As a result, when compared to the results of other models, the proposed framework predicted the lane and performed better in terms of accuracy, Dice loss, Dice coefficient, and IoU.

Conclusion
In this article, a CNN based encoder-decoder model is used for precise lane detection based on attention mechanism.The proposed strategy achieved effective and accurate lane recognition especially under challenging circumstances because of the accurate data collected by the attention modules.The evaluation showed that the proposed approach could achieve greater Dice Score, Accuracy, IoU, Precision, F1 score, and Recall when compared to the existing methods.Additionally, the recommended strategy demonstrated its robustness in a variety of difficult settings by producing great qualitative results.Although this technique may be used on both structured and unstructured roads, it performs well on unstructured roads when there are significant damages to the road and no lane markers are available in various weather situations.Though the LD-CAM model shows promising results its computational complexity may increase due to introduction of enhanced convolutional attention mechanism (E-CBAM), while deployed in low power embedded systems commonly used in autonomous vehicles.Future work will focus on optimizing the E-CBAM to reduce computational overhead while maintaining high detection accuracy.Additionally, although the model performs well in extreme weather conditions, its performance under varying lighting conditions, such as sudden transitions from sunlight to shadows, needs further investigation.Incorporating adaptive illumination techniques or additional preprocessing steps 14:19193 | https://doi.org/10.1038/s41598-024-70116-zwww.nature.com/scientificreports/

Figure 4 .
Figure 4. Overview of Evaluation Process.

Figure 5 .
Figure 5. Curve of Loss, DC and IoU while Training and Validation at 50 epochs.

Figure 6 .
Figure 6.Comparison of various performance metrics of proposed with existing approaches.

Figure 7 .
Figure 7. Performance metrics of the proposed method.

Figure 8 .
Figure 8. Lane detection in curve lane conditions.

Figure 9 .
Figure 9. Lane detection in damaged road condition.

Figure 10 .
Figure 10.Lane detection in different weather conditions.

Table 1 .
Comparision table for Literature interms of methods, Limitations and Performance measures.

Table 2 .
Model performance comparison.